TUMOUR METASTASIS GENE

Background

Metastatic spread of tumours from the site of primary growth to
distant organs, where seedling colonies are formed by disseminated
cells, is the most important property of malignant tumours. It
endows the community of tumour cells with the ability to survive
surgical excision of the primary growth. Also, because metastases
can themselves act as foci for further shedding and dissemination
of tumour cells, this process forms the basis for a geometric
increase in the burden of tumour on the host, with increasing
difficulty in clinical management because of the wide distribution
of the tumour burden. The effect of this phenomenon on human health
can be appreciated by reference to the mortality statistics
published by the Registrar General of the United Kingdom.
Approximately one in three of the population die of the
consequences of metastatic cancer or are found to harbour
asymptomatic metastatic tumour deposits at autopsy. To obtain
[illegible] which [illegible] helpful in early assessment of tumour
prognosis, or in preventing the growth of already established
metastases, [illegible] significant problem. The following work was
undertaken as a contribution to such an endeavour.



Current 

In work recently conducted in the inventor's laboratory it has been
found that, if one is sufficiently persistent, it is feasible to
transfer metastatic capability from human metastatic tumour cells
to non-metastatic mouse tumour cells, by transfection with genomic
DNA from the metastatic population (Tarin 1988). On inoculation
into nude mice the transfected cells make many metastatic deposits
in various organs. The new phenotype is stable through many cell
generations and can be transferred again in a second round of
transfection, using DNA from metastases formed by the primary
transfectants, which we have introduced into fresh cells of the
non-metastatic mouse cell-line. Subsequently it has been
demonstrated in this programme of work that concomitant transfer of
the donor DNA (of human origin) through both rounds of transfection
can be detected by several convergent lines of evidence, including
Southern blotting, Alu-PCR and in situ hybridisation (Hayle,
Darling, Taylor and Tarin, 1993) using human Alu-specific probes
with appropriate controls. Still more recently this work has led to
the isolation of clones containing human DNA from the transfected
metastatic cells, by making a genomic library of their DNA in
cosmids and screening it with human Alu-specific probes. From one
of the bacterial clones so identified it was possible to subclone a
2.9 Kb DNA fragment that hybridises specifically to Southern blots
of human DNA to identify a sharp homologous band suggestive of a
sequence present in single or low copy number. This indicates that
the homology is not to multiple iterative sequences present in the
human genome, which would have been expected to produce a smear.
(It should be mentioned that, to visualise the band, non-specific
cross-hybridisation of Alu repeats in the probe to counterparts in
the target human DNA was blocked with excess unlabelled Alu DNA
prepared by PCR.)

The fragment has been sequenced and comparison of this information
with entries in the GenBank/EMBL DataBank indicates that it
contains human DNA which has not been previously recorded. Further
analysis of the sequence by computer programmes to detect coding
regions, as well as by Northern blotting and by reverse
transcription-polymerase chain reaction (RT-PCR) techniques, has
provided converging lines of evidence that parts of it are
vigorously transcribed (expressed) in malignant human tumours and
their metastases, but not comparably so in non-neoplastic tissue.
The significance of this finding is that the sequence has the
potential to be a valuable probe for the accurate assessment of the
prognosis of patients with malignant tumours, by examination of a
tiny biopsy sample or even a few cells obtained by fine needle
aspiration, and thus to influence therapy.



The Invention

The invention provides the 2858bp DNA whose sequence (SEQ ID NO: 1)
is shown in the Figure.

The invention also provides a nucleic acid which codes for a
protein which is expressed in malignant human tumours and their
metastases, which nucleic acid is selected from: the 2858bp DNA
whose sequence (SEQ ID NO: 1) is shown in the Figure, degenerated
and allele variations thereof, fragments thereof, longer DNA chains
comprising any of these and DNA which hybridises to any of these.

The nucleic acid may be incorporated into an expression vector,
which may be used to transform a microorganism. The expression
vector and the transformed microorganism constitute further aspects
of the invention.

In another aspect, the invention provides use of the defined
nucleic acids or derivatives or fragments thereof for the
identification, preparation or isolation of the nucleotide sequence
or portions thereof coding for a protein which is expressed in
malignant human tumours and their metastases. Thus the inventor
intends to proceed with blotting, PCR and library screening
techniques, to search for related flanking sequences and cDNA
clones. In this way it is hoped to recover stretches of human DNA
which are worth testing in functional assays to evaluate their
metastatic inductive capability. These experiments may include
introduction of the defined expression vectors into non-metastatic
tumour cell lines.

The invention also provides a method of investigating metastasis
which method comprises obtaining a sample of cells, and analysing
the sample for the nucleic acid of the 2858bp nucleic acid fragment
or for a complementary RNA sequence. This analysis may preferably
involve the use of reverse transcriptase to form cDNA corresponding
to RNA of the sample; amplifying the cDNA, e.g. by the polymerase
chain reaction; and performing a hybridisation assay of the
amplified DNA using as a hybridisation probe a fragment or the
whole of the defined DNA.

The sample of cells may be a clinical sample of body fluid (e.g.
blood, urine, sputum or stool) or body tissue (e.g. tumour tissue)
of a patient. The sample may be a histological section which is
probed using a fluorescent or other labelled probe for mRNA
corresponding to the 2858bp nucleic acid fragment.



Experimental

Computer analysis has indicated that the sequence contains sections
with characteristics signifying high probability that they are
coding regions. Several studies were performed on this 2.9 Kb
fragment to examine its informational content using various suites
of programmes available via the Oxford University VAX cluster.
These included looking for coding sequences by locating the
positions of potential start codons and by seeking stretches which
have no stop codons. Further methods used included codon preference
analysis (i.e. examination of whether the order of arrangement of
purine and pyrimidine bases is characteristic of coding sequences),
as well as searches for probable splice junction sites and other
more specialised techniques, to confirm that some of the open
reading frames so detected are coding regions. This information was
used to design PCR primers to the boundaries of one of the coding
regions which particularly attracted interest, and the RT-PCR
technique showed that one could specifically amplify homologous
mRNA sequences from RNA extracted from metastatic human tumour cell
lines. The exact sequences of the primers used were as follows:

P1  5' AATGACCCAGGAATGTCCAGGCCC (SEQ ID NO: 2)
P2  5' GAGGAGCACCTCACAGGCATCAAA (SEQ ID NO: 3)
P3  5' ACGTGTCGCAGAGCAGTGTGCTGT (SEQ ID NO: 4)
P4  5' TCTCACACCCATCTGGCTCCCACA (SEQ ID NO: 5)

and the positions of these are marked on the sequence above.
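The primer positions can be cross-checked against SEQ ID NO: 1 with a short script. The following is a minimal sketch only, not part of the work described: it assumes that bases 961 to 1260 are as printed in the sequence listing, and the names `locate` and `amplicon_length` are hypothetical helpers introduced for illustration. Each primer is searched for on the listed strand and, failing that, as its reverse complement.

```python
# Bases 961-1260 of SEQ ID NO: 1, copied from the sequence listing
# (an assumption: the OCR of these rows is taken to be correct).
REGION_START = 961
REGION = "".join("""
AGAAATGACC CAGGAATGTC CAGGCCCCAC CCCCATCCTG CAGGAGAGAA GTCCCTCCTC
TCCTGATGCT CCCTCCTCCC TCTCCTGATG GTCCCTCCTC CCTCACCTCA TTCTCGGAAG
AACTGGCAGA GAGGAGCACC TCACAGGCAT CAAAGAACTC GGTGTGGGAG TCGGCGAGGG
ACAGCACACT GCTCTGCGAC ACGTGGGGGG TCAGCTCTCG GCCTTTCATG TACAGAGCTT
CTTGCTGTGG GAGCCAGATG GGTGTGAGAC CTCAGAGGCC ACTGGAGTGA CAGACTTCCT
""".split())

PRIMERS = {
    "P1": "AATGACCCAGGAATGTCCAGGCCC",
    "P2": "GAGGAGCACCTCACAGGCATCAAA",
    "P3": "ACGTGTCGCAGAGCAGTGTGCTGT",
    "P4": "TCTCACACCCATCTGGCTCCCACA",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate(primer, region=REGION, offset=REGION_START):
    """Return (first_base, last_base, strand) in 1-based coordinates.

    strand is "+" when the primer reads as the listed strand and "-"
    when its reverse complement does; None when neither is found.
    """
    for strand, query in (("+", primer), ("-", reverse_complement(primer))):
        idx = region.find(query)
        if idx >= 0:
            return (offset + idx, offset + idx + len(primer) - 1, strand)
    return None


def amplicon_length(primer_a, primer_b):
    """Length of the genomic segment bounded by two opposed primers."""
    a, b = locate(primer_a), locate(primer_b)
    if a is None or b is None or a[2] == b[2]:
        return None  # not found, or both primers point the same way
    return max(a[1], b[1]) - min(a[0], b[0]) + 1


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, locate(seq))
    print("P1/P4:", amplicon_length(PRIMERS["P1"], PRIMERS["P4"]))
    print("P2/P4:", amplicon_length(PRIMERS["P2"], PRIMERS["P4"]))
```

Under these assumptions the four primers fall at bases 964-987, 1091-1114, 1141-1164 and 1206-1229, which agrees with the primer_bind coordinates in the sequence listing, and the segments bounded by P1/P4 and P2/P4 are 266 and 139 bases long on the genomic sequence.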



Computer analysis of the sequence of the fragment

The sequence was analysed using the Genetics Computer Group (GCG)
package on the Oxford University molecular biology VAX cluster, the
BLAST network service at NCBI and the mail servers Grail, NetGene
and GeneID. The Grail mail server is trained to recognise potential
coding regions in human DNA; NetGene also uses a neural network
approach to predict splice sites in vertebrate genes; and the
GeneID mail server uses a hierarchical rule-based system to
recognise potential vertebrate coding genes.

Database searches were made at Oxford against EMBL release 34.0 and
SwissProt release 25, and at NCBI against the non-redundant DNA
database (containing EMBL release 34.0 and GenBank release 76.0)
and the non-redundant protein database (containing SwissProt
release 25, PIR release 36 and GenPept release 76).

The DNA sequence was searched against the EMBL and GenBank
databases using the GCG implementation of the FASTA program and the
NCBI BLAST service to look for homologies to any known sequences.
No homology to any known coding region was found. At the 3' end a
strong homology to a rodent Alu-like repetitive sequence was found,
suggesting that the 3' end contains a rodent sequence. The
remainder of the DNA fragment contained scattered sequences with
similarity to higher primate Alu repeats and several short segments
with familial resemblances to sections of a variety of human genes,
but no significant resemblances to rodent genes. This supports the
Southern blotting data that the cloned sequence is mainly a portion
of human genomic DNA retrieved from the mouse genome of the cells
into which it was transfected. The sequence, translated in all six
frames, was searched against the protein databases. No homologies
to any known protein sequences were seen.

The GCG program CodonPreference was used to display potential open
reading frames (i.e. stretches of sequence without an in-frame stop
codon), and to predict the likely coding regions, based on the
degree of codon bias shown towards a reference codon usage set of
highly expressed human genes. Levels of GC bias and codon usage
bias were seen that corresponded to possible open reading frames
(ORFs). Among the most notable is the region from approximately
bases 1650 to 1800 in the 2nd reading frame of the reverse strand.
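The idea of an open reading frame as a stretch without an in-frame stop codon can be illustrated with a small six-frame scan. This is a simplified sketch and not the CodonPreference algorithm, which also scores codon usage against a reference set. The function names, the `min_codons` threshold and the toy sequence are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq):
    """Fraction of bases that are G or C (0.0 for an empty string)."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def open_stretches(seq, min_codons=3):
    """Yield (strand, frame, start, end) for stop-free runs of codons.

    start/end are 0-based, half-open offsets on the strand being read
    ("+" is seq itself, "-" is its reverse complement); frame is 1-3.
    Runs shorter than min_codons are skipped.
    """
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            pos = frame
            while pos + 3 <= len(s):
                if s[pos:pos + 3] in STOP_CODONS:
                    if (pos - run_start) // 3 >= min_codons:
                        yield (strand, frame + 1, run_start, pos)
                    run_start = pos + 3  # next run begins after the stop
                pos += 3
            if (pos - run_start) // 3 >= min_codons:
                yield (strand, frame + 1, run_start, pos)


if __name__ == "__main__":
    toy = "ATGAAACCCGGGTAAATGTTT"  # toy sequence, not from the fragment
    for hit in open_stretches(toy):
        strand, frame, start, end = hit
        print(hit, "codons:", (end - start) // 3)
```

Applied to the full 2858 bp sequence with a larger `min_codons`, a scan of this kind would list candidate stretches in all six frames; deciding which of them are likely to be coding still needs the codon bias and splice site evidence described in the text.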

The entire sequence was submitted to the NetGene, GeneID and Grail
mail servers to detect potential splice sites, genes and exons.
Grail predicted three possible exons, one in the forward strand in
frame 2 (between bases 536 and 942) and two in the reverse strand,
in frames 1 (between bases 2143 and 2398) and 2 (between bases 1625
and 1907). These three regions all corresponded to exons predicted
by GeneID and also to donor and acceptor sites found by NetGene
(see Table 2). All three exons fell within regions of higher than
expected codon preference and GC bias as predicted by
CodonPreference analysis. The region around the possible exon in
the second frame of the reverse strand was therefore the first one
chosen for further study, being the one with the highest
probability of being a coding region.
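The agreement between the three programs can be checked mechanically from the coordinates in Table 2. The sketch below is an illustration only: the `Region` record and helper names are hypothetical, and the numbers are the base positions listed in Table 2. It tests that each GeneID exon lies inside the corresponding Grail ORF extent and that the NetGene donor coordinate is the base immediately after the GeneID exon end.

```python
from typing import NamedTuple, Tuple


class Region(NamedTuple):
    label: str
    grail_orf: Tuple[int, int]    # Grail "Extent of ORF" (first, last base)
    geneid_exon: Tuple[int, int]  # GeneID exon start and end
    netgene_donor: int            # NetGene donor site


# Coordinates as listed in Table 2.
TABLE_2 = [
    Region("forward strand, frame 2", (539, 901), (560, 869), 870),
    Region("reverse strand, frame 2", (1628, 1906), (1655, 1792), 1793),
    Region("reverse strand, frame 1", (2146, 2397), (2149, 2389), 2390),
]


def is_nested(inner, outer):
    """True when the inner interval lies wholly within the outer one."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def span(interval):
    """Number of bases in an inclusive (first, last) interval."""
    return interval[1] - interval[0] + 1


if __name__ == "__main__":
    for r in TABLE_2:
        print(
            r.label,
            "exon within ORF:", is_nested(r.geneid_exon, r.grail_orf),
            "donor follows exon end:", r.netgene_donor == r.geneid_exon[1] + 1,
            "exon length:", span(r.geneid_exon),
        )
```

A check of this kind only confirms that the listed coordinates are mutually consistent; it says nothing about whether the predicted exons are real.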

The whole DNA sequence was also examined for potential
transcription factor coding domains and binding sites by searching
against release 6.3 of the Ghosh database using GCG FindPatterns.
Although some tentative matches were found, a detailed study of the
compositions of these and their locations in the three reading
frames indicated that these were all very unlikely to be true
transcription factor coding regions. The translated sequence was
also searched against release 10.1 of the Prosite database to
search for potential DNA binding regions using the GCG program
Motifs, but no homology to previously recorded regions could be
identified.

Investigation of expression

Evidence that one of the putative coding regions identified by
computer analysis in this fragment is expressed in neoplastic or
metastatic tumour tissue was provided by experiments using the
techniques of Northern blotting and RT-PCR. Northern blots of mRNA
from metastatic cell lines A375M (the donor of the DNA used for the
original transfection of metastatic behaviour) and 4A4 (a clonal
line derived



WO 94/28129 PCT/GB94/0U60 


(Bao et al., 1992) from the human breast carcinoma cell line
MDA-MB-435), probed with a 32P-labelled sample of the full 2858
base pair sequence, showed specific hybridisation to two small
transcripts of approximately 300bp size, but no comparable homology
to mRNA from a virtually non-metastatic cell line, 2C5, cloned from
MDA-MB-435.

Reverse Transcription - Polymerase Chain Reaction (RT-PCR)

Messenger RNA extracted from cell lines and solid tissue samples
was reverse transcribed with viral reverse transcriptase and the
cDNA so obtained specifically amplified with primers P1 and P4,
designed to anneal to the outer ends of the putative coding region
identified by computer analysis between base 951 and 1233 on the
reverse strand of the 2858 base pair complete sequence. Samples
were also amplified using primers P2 and P4. The PCR products were
separated by gel electrophoresis in 1.6% agarose and stained with
ethidium bromide for viewing in a UV transilluminator. After
photography the gels were blotted on to Hybond N+ (Amersham
International plc) nylon membranes and probed with 32P gamma-ATP
end-labelled oligonucleotide P3. After hybridisation the filters
were washed and exposed to Kodak X-ray film for 2-10 hours, after
which the film was developed.

The PCR cycle parameters were as follows: 1 period at 94°C for 4
minutes, followed by 1 period at 82°C for 2 minutes, during which
time the Taq enzyme was added, followed by 30 cycles of 92°C for 30
seconds, 60°C for 30 seconds and 70°C for 2 minutes.
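The cycling programme can be written down as data and its total hold time summed. This is a sketch under stated assumptions: the temperatures and durations are those quoted in the text, the `Step` record and `total_seconds` helper are hypothetical names, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    temp_c: int   # hold temperature in degrees Celsius
    seconds: int  # hold time in seconds


# Values as quoted in the text; the enzyme is added during the second step.
INITIAL_STEPS: List[Step] = [Step(94, 4 * 60), Step(82, 2 * 60)]
CYCLE_STEPS: List[Step] = [Step(92, 30), Step(60, 30), Step(70, 2 * 60)]
N_CYCLES = 30


def total_seconds(initial, cycle, n_cycles):
    """Sum of all hold times, ignoring ramping between temperatures."""
    one_cycle = sum(step.seconds for step in cycle)
    return sum(step.seconds for step in initial) + n_cycles * one_cycle


if __name__ == "__main__":
    total = total_seconds(INITIAL_STEPS, CYCLE_STEPS, N_CYCLES)
    print(f"{total} s = {total / 60:.0f} min of programmed hold time")
```

With these figures the programmed hold time comes to 5760 seconds, i.e. 96 minutes, of which 90 minutes are the 30 amplification cycles.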

Control studies to monitor the quality of mRNA and the success of
cDNA synthesis in the RT-PCR techniques were conducted using 2 µl
aliquots from the same samples amplified with primers to the human
β-actin gene (Clontech Laboratories Inc., Palo Alto CA).

When blots of PCR products of cDNA obtained by reverse
transcription of mRNA from these cell lines and amplified by primer
pairs P1 and P4 and P2 and P4 were probed with oligonucleotide P3,
strong hybridisation was seen to bands of the predicted sizes in
the tracks containing samples from the metastatic cells (A375M and
4A4) and weak hybridisation to similar sized bands in the track
containing sample from the virtually non-metastatic cell line
(2C5).

Evidence of expression of the coding region in tissues from human
primary tumours and their metastases has also been obtained using
RT-PCR with the primers chosen, in a preliminary survey of fresh
samples from such lesions and from normal tissue counterparts
(Table 1). A disproportionately large quantity of specific product
corresponding to the amplified segment was observed in samples from
metastases and matched primary tumours from all 4 malignant cases
studied. In 9 samples from normal tissues only trace expression was
detectable. This trace was not visible on ethidium bromide stained
gels and required blotting and probing with 32P-labelled
oligonucleotide P3 to be detected (Table 1).

Samples from 2 benign tumours showed very low expression (Table 1).
Collectively these results confirm that the coding region
identified in the 2858 bp cloned DNA fragment is expressed in the
malignant tumours examined and indicate that homologous transcripts
are present only in trace amounts in the non-neoplastic tissue
samples. Expression was also low in the benign (i.e. non-invasive,
non-metastatic) tumours studied.

TABLE 1

RESULTS OF CLINICAL SAMPLES EXAMINED FOR GENE EXPRESSION

Patient
number   Sample                  Tissue

 1       Lymph node metastases   Breast carcinoma
         Primary                 Breast carcinoma
 2       Lymph node metastases   Breast carcinoma
         Primary                 Breast carcinoma
 3       Lymph node metastases   Breast carcinoma
         Primary                 Breast carcinoma
 4       Lymph node metastases   Colon carcinoma
         Primary                 Colon carcinoma
         Adenoma                 Colon
 5       Primary                 Colon carcinoma
 6       Fibroadenoma            Breast
 7       Fibroadenoma            Breast
 8       Normal                  Breast
 9       Normal                  Breast
10       Normal                  Breast
11       Normal                  Breast
12       Normal                  Breast
13       Normal                  Colon
14       Normal                  Colon
15       Normal                  Colon
16       Diverticulitis          Colon


+++  Very Strong
++   Strong
+    Weak
t    Trace
-    Nothing



MAGNA gene expression result / β-actin expression
(values in the order extracted; alignment with the table rows is not recoverable)

++  ++  +++  +  ++  +++  ++  +++  ++  ++  +
++  ++  +++  +++  ++  ++  ++  ++  +++  +++  +++


Useful cases:

i) 9 non-neoplastic   ii) 2 fibroadenoma   iii) 4 metastatic cancer
iv) 1 non-metastatic cancer   v) 1 colonic adenoma (from patient 4, who is
also in category iii above)

Footnote: β-actin expression was determined in an aliquot from each sample as a
control to evaluate quality of mRNA obtained from the sample.

TABLE 2

SUMMARY OF COMPUTER ANALYSIS
CODING REGIONS

BASE     PROGRAM    FEATURE

Forward Strand Frame 2
 539     Grail      Extent of ORF
 559     NetGene    Acceptor Site
 560     GeneID     Exon Start
 869     GeneID     Exon End
 870     NetGene    Donor Site
 901     Grail      Extent of ORF

Reverse Strand Frame 2
1628     Grail      Extent of ORF
1655     GeneID     Exon Start
1792     GeneID     Exon End
1793     NetGene    Donor Site
1906     Grail      Extent of ORF

Reverse Strand Frame 1
2146     Grail      Extent of ORF
2149     GeneID     Exon Start
2389     GeneID     Exon End
2390     NetGene    Donor Site
2397     Grail      Extent of ORF

SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:

(A) NAME: ISIS INNOVATION LIMITED
(B) STREET: 2 South Parks Road
(C) CITY: Oxford
(E) COUNTRY: United Kingdom
(F) POSTAL CODE (ZIP): OX1 3UB

(A) NAME: TARIN, DAVID
(B) STREET: [illegible] Cottage, 58 Tree Lane, Iffley,
(C) CITY: Oxford
(E) COUNTRY: United Kingdom
(F) POSTAL CODE (ZIP): OX4 4EY

(ii) TITLE OF INVENTION: TUMOUR METASTASIS GENE

(iii) NUMBER OF SEQUENCES: 5

(iv) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)

(vi) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: GB 9311130.0
(B) FILING DATE: 28-MAY-1993

(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 2858 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: primer_bind
(B) LOCATION: complement (964..987)

(ix) FEATURE:

(A) NAME/KEY: primer_bind
(B) LOCATION: complement (1091..1114)

(ix) FEATURE:

(A) NAME/KEY: primer_bind
(B) LOCATION: 1141..1164

(ix) FEATURE:

(A) NAME/KEY: primer_bind
(B) LOCATION: 1206..1229

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:

TTCCAGCTCC ACCTCCCGAG TTGCTGGAAT TATAGGTGTC TGTCTGCCGC CACTCTCAGT     60
TTATGCAGGG CTGGGGTCTG AACCCAGGGC TTTGTGCAAA GGAGGCAATG CCCAAAACCA    120
CACTACACTC CCTACGTCCT CCACCATTTT TAGTAAAATG TCAAGCCCAA AACTACTCTG    180
CCAATTCGCT CAAGTGGAAC CACCTGTCTC CCTGCCACAC CCTATTAAGC CTATAGGTGG    240
AGGCCAGCGC CACTCTCAAG CCTGGCCCAC CCCACCCCAG AAGTGCCTCC CCCCCACCAG    300
ATCCAGGTCC TCCACCGTAT TCCCCAACTC ATGGTTCCAA GGTTAATTCT AGAATGCGTA    360
CCCAAAGCCA ATAGCCCACC AGACACAACA GACTGCCTTC TCATGAACTA GGCCATGATC    420
AAACAGCTGC CCCCCACACA CACACACAGG TCCCCCATTC AGTTGGTACC TTTTTGATAG    480
CGGTCAGCTC CCCTGATATC CAGCACCTCC TCAGACAGGC TGGTGGTGAT CTCGCTAGCA    540
CAAGACTCTT CCTCCTCAGA ACCTGGGCGG GAAGAATTGC AAGGTAGGGG TAGACAGACT    600
GCAATGCCCA GGACCTGGTA AGAATGTGCA TAAAACCCTA GCCCTTTGGT GGCTAAAGAA    660
GGATGAGCAG GGAGGGGAGG AGCTTTTAGC CCTAAGACAA CAACAACATC CTGTCAGGAC    720
GGGTACCGGA CTTATAGCAA AGAGCCTGGG AAATTGGCGA GACTATGTGG AAGAGAAGTT    780
GATGGTGGCG GCGGAGATCC AGAGTCTGGG TCAAAGAAGC ATGAACATGG AAAGGGGGTC    840
CAGGAAGGAT AACTTCAGAG AGCAGACAGG TAAGGCATGT CCAACAAGGA GAAGAGGTTT    900
CTAGAGTCAC ACAAATCTAA CAGAGCTGGG TACCTCTCAG AGATGGCTGC TAAGGTGGTG    960
AGAAATGACC CAGGAATGTC CAGGCCCCAC CCCCATCCTG CAGGAGAGAA GTCCCTCCTC   1020
TCCTGATGCT CCCTCCTCCC TCTCCTGATG GTCCCTCCTC CCTCACCTCA TTCTCGGAAG   1080
AACTGGCAGA GAGGAGCACC TCACAGGCAT CAAAGAACTC GGTGTGGGAG TCGGCGAGGG   1140
ACAGCACACT GCTCTGCGAC ACGTGGGGGG TCAGCTCTCG GCCTTTCATG TACAGAGCTT   1200
CTTGCTGTGG GAGCCAGATG GGTGTGAGAC CTCAGAGGCC ACTGGAGTGA CAGACTTCCT   1260
GGAGTGGGAA CTATCACCCC CCACCCTCCT GCCAAGCAGA AGTAGCAAAA GAGAGGAAGA   1320
GCTTAAGGGA GAGGGAAAAT CTTGGACTTA GAAGAGAGGC TGGGCACCAA TAGAGCGTAG   1380
CTCCACCCTT CTCCTTGTTT GTTTTGTTTT GTTTTTTCTC TGTGTAGCTC TGGCTGTCCT   1440
CGGAACTCAC TTTGTAGACC AGGCAGGCCT AAAACTCAGA AATACCCTGC CTCTCCTCCT   1500
CTCAAGTTCT GGGATTAAAG GCGTGTGCAC CACCGCGGCC ACTCTTCTCC TTCCTGACCC   1560
ACTCAGCTCG GAACCACACC CCATGGACAG GTGCAGTTAT GTCTCCACTT TGCAGATTAG   1620
AAGACTGAGG CTCAGAATAC AAGCTGGCAT GCACACCACC CTCAGACTCT AATTCAGCCT   1680
GGCTACTACT GAGGGTCCAT GAACCGGTCG ACTTAGTTAT TCTTTGGGTT TTACGTTTTG   1740
TGATGCAGAT ATGTCTGACC TGTGGCCCAT GAGCTGTACA CAAATGAATG CAGACTAATG   1800
CAAAATCATA AACTTACTCA AAACATTATG AAAATAGTTT GCACGAACTT TCTTTGTTGT   1860
TATTAAGTTG TTATACATTT TTGTTGGCTT GTTTTTTTGT TTTTTGGGAT TTTTTGTTTT   1920
TTTTTTTTTT TTGGTTTTTT TGAGACAGGG TTTCTCTGTG TAGCCCTGGC TGTTCTGGAA   1980
CTCAACTTTG TAGACCAGGC TGGCCTAAAG TCAGAAATCT GCCTGCCTCT GCCTTCCGAG   2040
TGCTGGGATT AACAGTAGGG CCACCACGCC CGGCTCCTTC TTTCTTTCTT TCTTTCTTCC   2100
TTTCTTTTTC GGTTTTTCAA GACAGGGTTC TGCTGTGTAG CCCTGGCTTT CCTGAACTCA   2160
GAAATCTGCC TGCCTCTGCC TCCCAAGTGC TGGGATTAAA GGCATGTGCA ACTGCCTGGC   2220
TTTTCTTTAT TTTGTGTTTT TTTTTAAATT TAATATTTAT TGTATGTGAG TACACTGTCA   2280
CTGCTTCAGA CACACCAAAA GAGGGCGATC AGATCACATT ATAGATGGTT GTGAGCACCG   2340
ATGTGGTTGG TACTGAGAAT TAAACTCAGG ACCTCTGGAA GAGCAGTCAG TGCTCTTAAC   2400
CACTTAGCCA TCTCTCCAGC CCTGTTTGTT TTTTCAAGAC AGAGTTTCTC TGTGTAGCCC   2460
TGGCTGTCCT AGAACCCACT CTGTAGACCA GGCTGGCCTC AAATTCAGAG ATCCACCTGC   2520
CTCTGCCTCC CAGGTGCTGG TCTACAGGGG AAGATTATGT TGTCCTTGGG TATGTCCTTA   2580
GGTAATGTCA AAGGCTGGAC AGGCCTGCTA AAGGGTAAGA ACCAACGCCT CACGGGCTCT   2640
GAAGTAAAAG GTAAAAATGT CCTCAGAAGC CAGAATATGG CTCAGATGCA GACTTCTGGC   2700
CTAGCATGCA AGGCCCTGTG TTCACGCCTC AGTACTACAA CCAACCCAAC CCAACCCAAC   2760
CCAACCCAAC CCAACCAACC CAACCCAAAA TATGATGCAC AAGCCATCTA CAGGAGCAGT   2820
CAAGAGAACT GTAGTGTTAT GTGAGAGAAA GGGAAGCT                           2858

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
AATGACCCAG GAATGTCCAG GCCC

24



(2) INFORMATION FOR SEQ ID NO: 3:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
GAGGAGCACC TCACAGGCAT CAAA

24

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:
ACGTGTCGCA GAGCAGTGTG CTGT

24

(2) INFORMATION FOR SEQ ID NO: 5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 24 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5:
TCTCACACCC ATCTGGCTCC CACA

24

CLAIMS

1. The 2858bp DNA whose sequence is shown in the figure (SEQ ID
NO: 1).

2. A nucleic acid which codes for a protein which is expressed in
malignant human tumours and their metastases, which nucleic acid is
selected from: the 2858bp DNA whose sequence is shown in the
figure, degenerated and allele variations thereof, fragments
thereof, longer DNA chains comprising any of these, and DNA which
hybridises to any of these.

3. An expression vector comprising the nucleic acid of claim 1 or
claim 2.

4. A transformed microorganism comprising the expression vector of
claim 3.

5. Use of the nucleic acid of claim 1 or claim 2 or derivatives or
fragments thereof for the identification, preparation or isolation
of a nucleotide sequence or portion thereof coding for a protein
which is expressed in malignant human tumours and their metastases.

6. A method of investigating metastasis which method comprises
obtaining a sample of cells, and analysing the sample for the
nucleic acid of claim 1 or claim 2 or for a complementary RNA
sequence.

7. A method as claimed in claim 6, wherein the sample of cells is a
clinical sample obtained from body fluid or body tissue of a
patient.

8. A method as claimed in claim 6 or claim 7, which method
comprises making cDNA from mRNA in the sample, amplifying a portion
of the cDNA comprising at least part of the DNA of claim 1, and
detecting the amplified DNA.

9. A method as claimed in claim 8, wherein the cDNA is amplified by
means of the polymerase chain reaction using as primers

P1  5' AATGACCCAGGAATGTCCAGGCCC (SEQ ID NO: 2) or
P2  5' GAGGAGCACCTCACAGGCATCAAA (SEQ ID NO: 3) and
P4  5' TCTCACACCCATCTGGCTCCCACA (SEQ ID NO: 5).

10. A probe which is a labelled oligonucleotide

P3  5' ACGTGTCGCAGAGCAGTGTGCTGT (SEQ ID NO: 4).

1/2

1 TTCCAGCTCC ACCTCCCGAG TTGCTGGAAT TATAGGTGTC TGTCTGCCGC
51 CACTCTCAGT TTATGCAGGG CTGGGGTCTG AACCCAGGGC TTTGTGCAAA 
101 GGAGGCAATG CCCAAAACCA CACTACACTC CCTACGTCCT CCACCATTTT 
151 TAGTAAAATG TCAAGCCCAA AACTACTCTG CCAATTCGCT CAAGTGGAAC 
201 CACCTGTCTC CCTGCCACAC CCTATTAAGC CTATAGGTGG AGGCCAGCGC 

251 CACTCTCAAG CCTGGCCCAC CCCACCCCAG AAGTGCCTCC CCCCCACCAG
301 ATCCAGGTCC TCCACCGTAT TCCCCAACTC ATGGTTCCAA GGTTAATTCT 
351 AGAATGCGTA CCCAAAGCCA ATAGCCCACC AGACACAACA GACTGCCTTC 
401 TCATGAACTA GGCCATGATC AAACAGCTGC CCCCCACACA CACACACAGG 
451 TCCCCCATTC AGTTGGTACC TTTTTGATAG CGGTCAGCTC CCCTGATATC 
501 CAGCACCTCC TCAGACAGGC TGGTGGTGAT CTCGCTAGCA CAAGACTCTT 
551 CCTCCTCAGA ACCTGGGCGG GAAGAATTGC AAGGTAGGGG TAGACAGACT 
601 GCAATGCCCA GGACCTGGTA AGAATGTGCA TAAAACCCTA GCCCTTTGGT 
651 GGCTAAAGAA GGATGAGCAG GGAGGGGAGG AGCTTTTAGC CCTAAGACAA
701 CAACAACATC CTGTCACGAC GGGTACCGGA CTTATAGCAA AGAGCCTGGG 
751 AAATTGGCGA GACTATGTGG AAGAGAAGTT GATGGTGGCG GCGGAGATCC 
801 AGAGTCTGGG TCAAAGAAGC ATGAACATGG AAAGGGGGTC CAGGAAGGAT 
851 AACTTCAGAG AGCAGACAGG TAAGGCATGT CCAACAAGGA GAAGAGGTTT 
901 CTAGAGTCAC ACAAATCTAA CAGAGCTGGG TACCTCTCAG AGATGGCTGC 
951 TAAGGTGGTG AGAAATGACC CAGGAATGTC CAGGCCCCAC CCCCATCCTG
1001 CAGGAGAGAA GTCCCTCCTC TCCTGATGCT CCCTCCTCCC TCTCCTGATG
1051 CTCCCTCCTC CCTCACCTCA TTCTCGGAAG AACTGGCAGA GAGGAGCACC

1101 TCACAGGCAT CAAAGAACTC GGTGTGGGAG TCGGCGAGGG ACAGCACACT
     CGAGACGCTG TGCA 5'P3
1151 GCTCTGCGAC ACGTGGGGGG TCAGCTCTCG GCCTTTCATG TACAGAGCTT

          ACACC CTCGGTCTAC CCACACTCT 5'P4
1201 CTTGCTGTGG GAGCCAGATG GGTGTGAGAC CTCAGAGGCC ACTGGAGTGA

1251 CAGACTTCCT GGAGTGGGAA CTATCACCCC CCACCCTCCT GCCAAGCAGA 

1301 AGTAGCAAAA GAGAGGAAGA GCTTAAGGGA GAGGGAAAAT CTTGGACTTA 

1351 GAAGAGAGGC TGGGCACCAA TAGAGCCTAG CTCCACCCTT CTCCTTGTTT 

1401 GTTTTGTTTT GTTTTTTCTC TGTGTAGCTC TGGCTGTCCT CGGAACTCAC 

SUBSTITUTE SHEET (RULE 26) 



1451 TTTGTAGACC AGGCAGGCCT AAAACTCAGA AATACCCTGC CTCTCCTCCT 
1501 CTCAAGTTCT GGGATTAAAG GCGTGTGCAC CACCGCGGCC ACTCTTCTCC 
1551 TTCCTGACCC ACTCAGCTCG GAACCACACC CCATGGACAG GTGCAGTTAT 
1601 GTCTCCACTT TGCAGATTAG AAGACTGAGG CTCAGAATAC AAGCTGGCAT 
1651 GCACACCACC CTCAGACTCT AATTCAGCCT GGCTACTACT GAGGGTCCAT 
1701 GAACCGGTCG ACTTAGTTAT TCTTTGGGTT TTACGTTTTG TGATGCAGAT 
1751 ATGTCTGACC TGTGGCCCAT GAGCTGTACA CAAATGAATG CAGACTAATG 
1801 CAAAATCATA AACTTACTCA AAACATTATG AAAATAGTTT GCACGAACTT 
1851 TCTTTGTTGT TATTAAGTTG TTATACATTT TTGTTGGCTT GTTTTTTTGT 
1901 TTTTTGGGAT TTTTTGTTTT TTTTTTTTTT TTGGTTTTTT TGAGACAGGG 
1951 TTTCTCTGTG TAGCCCTGGC TGTTCTGGAA CTCAACTTTG TAGACCAGGC 
2001 TGGCCTAAAG TCAGAAATCT GCCTGCCTCT GCCTTCCGAG TGCTGGGATT 
2051 AACAGTAGGG CCACCACGCC CGGCTCCTTC TTTCTTTCTT TCTTTCTTCC
2101 TTTCTTTTTC GGTTTTTCAA GACAGGGTTC TGCTGTGTAG CCCTGGCTTT 
2151 CCTGAACTCA GAAATCTGCC TGCCTCTGCC TCCCAAGTGC TGGGATTAAA 
2201 GGCATGTGCA ACTGCCTGGC TTTTCTTTAT TTTGTGTTTT TTTTTAAATT 
2251 TAATATTTAT TGTATGTGAG TACACTGTCA CTGCTTCAGA CACACCAAAA 
2301 GAGGGCGATC AGATCACATT ATAGATGGTT GTGAGCACCG ATGTGGTTGG 
2351 TACTGAGAAT TAAACTCAGG ACCTCTGGAA GAGCAGTCAG TGCTCTTAAC 
2401 CACTTAGCCA TCTCTCCAGC CCTGTTTGTT TTTTCAAGAC AGAGTTTCTC 
2451 TGTGTAGCCC TGGCTGTCCT AGAACCCACT CTGTAGACCA GGCTGGCCTC 
2501 AAATTCAGAG ATCCACCTGC CTCTGCCTCC CAGGTGCTGG TCTACAGGGG 
2551 AAGATTATGT TGTCCTTGGG TATGTCCTTA GGTAATGTCA AAGGCTGGAC 
2601 AGGCCTGCTA AAGGGTAAGA ACCAACGCCT CACGGGCTCT GAAGTAAAAG 
2651 GTAAAAATGT CCTCAGAAGC CAGAATATGG CTCAGATGCA GACTTCTGGC 
2701 CTAGCATGCA AGGCCCTGTG TTCACGCCTC AGTACTACAA CCAACCCAAC 
2751 CCAACCCAAC CCAACCCAAC CCAACCAACC CAACCCAAAA TATGATGCAC 
2801 AAGCCATCTA CAGGAGCAGT CAAGAGAACT GTAGTGTTAT GTGAGAGAAA 

2851 GGGAAGCT Length: 2858




